fix semantic tokens #13

zhvng · 2023-04-19T18:54:06Z

- Move normalization into the hubert model. This brings back part of 77ee0f4 (I realized the approach discussed in Hubert args normalization #7 is not a good idea).
- Simplify data processing and remove redundant operations.
- Change the way semantic tokens are computed in preprocessing: ~~MERT is trained on a context window of 5 seconds (!), which might explain the deterioration in sample quality over longer sequences noticed by @Saltb0xApps in the Discord~~ The m-a-p team confirmed that MERT generalizes to longer context lengths and still holds SOTA performance on various tasks. I'll leave the option to specify a shorter hubert context length, but it won't be used unless it is specified in the config.
- Add a bin_size option that averages adjacent hubert features to reduce the number of semantic tokens.
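A minimal sketch of what "normalization inside the model" can look like, assuming a per-clip zero-mean, unit-variance normalization of the raw waveform (the kind usually applied before HuBERT-style encoders). `NormalizedEncoder` and its `normalize`/`eps` parameters are hypothetical names for illustration, not this repo's API; `encoder` stands in for any callable that maps a (batch, samples) array to features.

```python
import numpy as np


class NormalizedEncoder:
    """Hypothetical wrapper: normalize audio inside the model call,
    so the data-loading pipeline can hand over raw waveforms."""

    def __init__(self, encoder, normalize=True, eps=1e-7):
        self.encoder = encoder      # callable: (batch, samples) array -> features
        self.normalize = normalize
        self.eps = eps

    def __call__(self, wav):
        wav = np.asarray(wav, dtype=np.float32)
        if wav.ndim == 1:           # treat a single clip as a batch of one
            wav = wav[None, :]
        if self.normalize:
            # zero-mean, unit-variance per clip (over the sample axis)
            mean = wav.mean(axis=-1, keepdims=True)
            var = wav.var(axis=-1, keepdims=True)
            wav = (wav - mean) / np.sqrt(var + self.eps)
        return self.encoder(wav)
```

With normalization living in the model wrapper, the data loader can skip its own normalization step, which is in line with the "remove normalization from data loading" commit.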

zhvng added 5 commits April 19, 2023 17:16

option to normalize inputs right before hubert (bring back part of 77ee0f4)

3abd28b

remove normalization from data loading, simplify processing

5e7865f

split audio into context window before passing through hubert (experimental)

4f9ce1d
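One way to do the (experimental) context-window split, sketched under assumptions the commit doesn't spell out: windows are non-overlapping, the last one is zero-padded, and features from all windows are concatenated and trimmed afterwards. `split_into_windows`, `merge_windows`, and `window_samples` are hypothetical names. Passing `window_samples=None` keeps the whole clip as one window, matching the note that a shorter context is only used when it is set in the config.

```python
import numpy as np


def split_into_windows(wav, window_samples=None):
    """Split a 1-D waveform into non-overlapping windows of `window_samples`.

    With window_samples=None the whole clip is returned as a single window.
    The last window is zero-padded up to full length.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if window_samples is None:
        return wav[None, :]
    n = wav.shape[0]
    num_windows = max(1, -(-n // window_samples))  # ceiling division
    padded = np.zeros(num_windows * window_samples, dtype=np.float32)
    padded[:n] = wav
    return padded.reshape(num_windows, window_samples)


def merge_windows(features, num_valid_frames):
    """Concatenate per-window features (windows, frames, dim) along time
    and drop the frames that came from zero padding."""
    flat = features.reshape(-1, features.shape[-1])
    return flat[:num_valid_frames]
```

The caller has to work out `num_valid_frames` from the encoder's frame rate and the original clip length.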

option to bin adjacent semantic features to reduce the number of semantic tokens

be6510f
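A minimal sketch of the binning idea: average every `bin_size` consecutive feature frames, so each semantic token covers `bin_size` frames. `bin_features` is a hypothetical helper, and dropping a trailing partial bin is an assumption (padding or averaging the remainder would also work).

```python
import numpy as np


def bin_features(features, bin_size):
    """Average every `bin_size` consecutive frames of a (frames, dim) array.

    Trailing frames that don't fill a whole bin are dropped.
    """
    features = np.asarray(features)
    if bin_size <= 1:
        return features
    num_bins = features.shape[0] // bin_size
    trimmed = features[: num_bins * bin_size]
    # (num_bins, bin_size, dim) -> mean over the frames inside each bin
    return trimmed.reshape(num_bins, bin_size, features.shape[-1]).mean(axis=1)
```

With `bin_size=2`, a 50 Hz feature sequence becomes 25 feature vectors per second, halving the number of semantic tokens.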

fix semantic window for inference

801d90c

zhvng marked this pull request as ready for review April 20, 2023 20:41

zhvng added 2 commits April 28, 2023 01:55

let output_hz denote frequency of output semantic tokens

64e49c1
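If `output_hz` is the rate of the output semantic tokens, a binning factor can be derived from it, assuming the encoder emits features at a fixed `feature_hz` (HuBERT-style encoders at 16 kHz produce about 50 frames per second). `bin_size_for_output_hz` and `feature_hz` are hypothetical; this sketches the relationship, not how the repo computes it.

```python
def bin_size_for_output_hz(feature_hz, output_hz):
    """Number of encoder frames to average per output semantic token.

    Requires feature_hz to be an integer multiple of output_hz.
    """
    if feature_hz <= 0 or output_hz <= 0:
        raise ValueError("rates must be positive")
    ratio = feature_hz / output_hz
    bin_size = round(ratio)
    if bin_size < 1 or abs(ratio - bin_size) > 1e-9:
        raise ValueError(
            f"feature_hz={feature_hz} is not an integer multiple of output_hz={output_hz}"
        )
    return bin_size
```

Specifying the token rate directly, rather than a bin count, presumably keeps the config meaningful even if the encoder's frame rate differs between models.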

add setup.py

3c22717
